Building a morphological and syntactic lexicon by merging various linguistic resources
Abstract
This paper shows how large-coverage morphological and syntactic NLP lexicons can be developed by interpreting existing lexical resources, converting them to a common format, and merging them. Applied to Spanish, this approach allowed us to build a morphological and syntactic lexicon, the Leffe. It relies on the Alexina framework, originally developed together with the French lexicon Lefff. We describe how the input resources (two morphological and two syntactic lexicons) were converted into Alexina lexicons and merged. A preliminary evaluation shows that merging different sources of lexical information is indeed a good approach to improving the development speed, coverage, and precision of linguistic resources.
Related resources
The Lefff, a Freely Available and Large-coverage Morphological and Syntactic Lexicon for French
In this paper, we introduce the Lefff, a freely available, accurate and large-coverage morphological and syntactic lexicon for French, used in many NLP tools such as large-coverage parsers. We first describe Alexina, the lexical framework in which the Lefff is developed as well as the linguistic notions and formalisms it is based on. Next, we describe the various sources of lexical data we use...
A Linguistic Analysis of Conference Titles in Applied Linguistics
Over the past twenty-five years, researchers have expressed considerable interest in titles of academic publications. Unfortunately, conference paper titles (CPTs) have only recently begun to receive attention. The aim of this study, therefore, is to investigate the text length, syntactic structure, and lexicon of CPTs in Applied Linguistics. A data set of 698 titles was selected from the 2008 ...
Merging a Syntactic Resource with a WordNet: a Feasibility Study of a Merge between STO and DanNet
This paper presents a feasibility study of a merge between SprogTeknologisk Ordbase (STO), which contains morphological and syntactic information, and DanNet, which is a Danish WordNet containing semantic information in terms of synonym sets and semantic relations. The aim of the merge is to develop a richer, composite resource which we believe will have a broader usage perspective than the two...
Identifying Multi-word Expressions by Leveraging Morphological and Syntactic Idiosyncrasy
Multi-word expressions constitute a significant portion of the lexicon of every natural language, and handling them correctly is mandatory for various NLP applications. Yet such entities are notoriously hard to define, and are consequently missing from standard lexicons and dictionaries. Multi-word expressions exhibit idiosyncratic behavior on various levels: orthographic, morphological, syntac...